R-Ladies MTL October 2023 Meetup
2023-10-30
Welcome! Today’s agenda:
6:00 - 6:15: Settling in
6:15 - 6:30: Introductions and call for presenters!
6:30 - 7:30: “R-omics”: Using R for bioinformatics
- Includes 20+ min break for exercises
7:30 - 8:00: R-help/networking period
R-Ladies Global seeks to achieve proportionate representation by encouraging, inspiring, and empowering women and gender minorities currently underrepresented in the R community.
We focus on the R language/environment (but if you want to talk about Python or SAS, you won’t be banned)
All members must follow the R-Ladies Code of Conduct
Slides available from (link also in Meetup chat): https://github.com/rladies/meetup-presentations_montreal/20231030-using-r-for-bioinformatics
Scripts for exercises and demo also available.

Bioinformatics is broad and rapidly evolving
Selection of tools and/or languages depends on application
Applications with a Graphical User Interface are common (e.g., Clustal for alignment)
As are command-line tools
Install Bioconductor’s package manager
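A minimal sketch of this step, assuming CRAN access and a reasonably recent R version. BiocManager itself is installed from CRAN; Biostrings appears here only as an example of a Bioconductor package to install afterwards.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Bioconductor packages are then installed by name (Biostrings as an example)
BiocManager::install("Biostrings")
```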
Search for available packages
# List all packages
BiocManager::available()
# List only packages containing "sapiens" in the name
BiocManager::available(pattern = "sapiens")
You can also use https://www.bioconductor.org/packages/.
DNAString, AAString)

| Biostrings Function | Purpose |
|---|---|
| translate() | Translate DNA/RNA to amino acids |
| | Get the (reversed) complementary sequence of base pairs |
| alphabetFrequency() | Count the frequency of each letter |
| matchPattern() | Find matching occurrences of a pattern |

Install the Bioconductor package Biostrings.
Store at least 2 DNA sequences in a DNAStringSet object.
Translate your DNA into amino acids.
Bonus 1: Try incorporating ambiguous bases in your DNA. Test different if.fuzzy.codon options of translate().
Count the number of methionine (M) residues in each sequence.
Bonus 2: Summarize the distribution across all of your sequences.
https://www.ddbj.nig.ac.jp/ddbj/code-e.html
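One possible way to approach the exercise, shown as a sketch rather than the official solution. The two sequences and their names are invented, and alphabetFrequency() is just one of several ways to count residues.

```r
library(Biostrings)

# Two made-up coding sequences stored together in a DNAStringSet
seqs <- DNAStringSet(c(seq1 = "ATGGCCATGAAA", seq2 = "ATGTTTGGGATGATG"))

# Translate every sequence at once; the result is an AAStringSet
aa <- translate(seqs)

# Bonus 1: N is an ambiguous base. The default if.fuzzy.codon = "error"
# stops on it, while "solve" translates codons that remain unambiguous
# (GCN codes for alanine whatever the last base is).
fuzzy <- DNAString("ATGGCN")
aa_fuzzy <- translate(fuzzy, if.fuzzy.codon = "solve")

# Count methionine (M) per sequence: alphabetFrequency() returns one row
# per sequence and one column per amino-acid letter
m_counts <- alphabetFrequency(aa)[, "M"]
names(m_counts) <- names(seqs)

# Bonus 2: summarize the distribution of M counts across sequences
print(m_counts)
print(summary(m_counts))
```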
Bioinformatics data often come in specialized file formats
Common sequence data formats include:
FASTA
FASTQ - contains quality information
BAM
SAM - contains alignment to reference sequence
samtools (Rsamtools in R), seqinr, and ShortRead are helpful tools and packages for working with these files
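As an illustration of the simplest of these formats, the sketch below writes two made-up sequences to a temporary FASTA file and reads them back. It uses Biostrings rather than the packages listed above, purely to stay with functions already introduced; the file path and sequence names are hypothetical.

```r
library(Biostrings)

# Two made-up sequences with names that become the FASTA headers
seqs <- DNAStringSet(c(seq1 = "ATGGCC", seq2 = "ATGTTT"))

# Write to a temporary FASTA file
fasta_path <- tempfile(fileext = ".fasta")
writeXStringSet(seqs, fasta_path)

# A FASTA file is plain text: a ">" header line followed by the sequence
print(readLines(fasta_path))

# Read the file back into a DNAStringSet
seqs_in <- readDNAStringSet(fasta_path)
print(seqs_in)
```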
Many useful tools are run from the command line
First 10 lines of output:
# BLASTP 2.9.0+
# Query: sars_partial
# RID: KZSA0UTY016
# Database: nr
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 500 hits found
sars_partial 7WHC_A 100.000 70 0 0 1 70 1 70 3.24e-44 151
sars_partial 7UJU_A 100.000 70 0 0 1 70 2 71 3.52e-44 151
sars_partial 7UJ9_A 100.000 70 0 0 1 70 1 70 4.77e-44 151
aa_seq <- "SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQA"
names(aa_seq) <- "sars_partial"
query_file <- "aa_seq.fasta"
out_file <- "results.txt"
# Write sequence as a FASTA file
seqinr::write.fasta(aa_seq, names(aa_seq), file.out = query_file)
# Construct call
blast_call <- paste0("blastp -query ", query_file, " -remote -db nr -out ", out_file, " -outfmt 7 -evalue 1e-30")
# Execute
system(blast_call)
Note: Command-line tools (e.g., blastp) need to be installed and configured before use.
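Once the results file exists, the tabular output can be read back into R. The sketch below uses base R only and parses the three hit lines shown in the output above instead of a real file, so it runs without blastp. The column names are my own shorthand for the entries on the "# Fields:" line.

```r
# Header and hit lines copied from the output above, standing in for results.txt
blast_text <- "# BLASTP 2.9.0+
# Query: sars_partial
sars_partial 7WHC_A 100.000 70 0 0 1 70 1 70 3.24e-44 151
sars_partial 7UJU_A 100.000 70 0 0 1 70 2 71 3.52e-44 151
sars_partial 7UJ9_A 100.000 70 0 0 1 70 1 70 4.77e-44 151"

# Shorthand column names (assumed), in the order given by the "# Fields:" line
blast_cols <- c("query", "subject", "pct_identity", "aln_length",
                "mismatches", "gap_opens", "q_start", "q_end",
                "s_start", "s_end", "evalue", "bit_score")

# Lines starting with "#" are skipped as comments
hits <- read.table(text = blast_text, comment.char = "#",
                   col.names = blast_cols, stringsAsFactors = FALSE)

# Hits sorted by e-value, smallest first
hits <- hits[order(hits$evalue), ]
print(hits[, c("subject", "pct_identity", "evalue", "bit_score")])
```

For a real run, replacing `text = blast_text` with `file = out_file` should read the file written by blastp.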
E.g., gene expression data from microarrays or RNA-seq
Presents special challenges for analysis and visualization
Spatial biology platforms map gene expression over tissue coordinates
10x Visium counts mRNA for each gene in each 55 µm spot on the grid
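To make the shape of such data concrete, here is a toy sketch in base R: a gene-by-spot count matrix plus grid coordinates for each spot. Everything in it is simulated and hypothetical (gene names, grid size, counts); real Visium data are far larger and are handled through Seurat objects in the demo.

```r
set.seed(1)

# Hypothetical 4 x 4 grid of spots
spots <- expand.grid(row = 1:4, col = 1:4)
spots$spot_id <- paste0("spot_", seq_len(nrow(spots)))

# Simulated counts: genes in rows, spots in columns
genes <- c("geneA", "geneB", "geneC")
counts <- matrix(rpois(length(genes) * nrow(spots), lambda = 5),
                 nrow = length(genes),
                 dimnames = list(genes, spots$spot_id))

# Attach one gene's counts to the spot coordinates
spots$geneA <- counts["geneA", spots$spot_id]

# Plot the grid, scaling point size by the simulated count
plot(spots$col, spots$row, cex = 1 + spots$geneA / 5, pch = 16,
     xlab = "Grid column", ylab = "Grid row",
     main = "Simulated geneA counts per spot")
```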
See seurat_demo.R to follow along
Based on Seurat vignette/tutorial available at: